arXiv:1509.03795v1 [physics.soc-ph] 13 Sep 2015


Peer pressure: enhancement of cooperation through mutual punishment 


Han-Xin Yang,1 Zhi-Xi Wu,2 Zhihai Rong,3 and Ying-Cheng Lai4

1Department of Physics, Fuzhou University, Fuzhou 350108, China
2Institute of Computational Physics and Complex Systems, Lanzhou University, Lanzhou, Gansu 730000, China
3Web Sciences Center, University of Electronic Science and Technology of China, Chengdu 610054, China
4School of Electrical, Computer and Energy Engineering, Arizona State University, AZ 85287, USA

(Dated: September 15, 2015) 

An open problem in evolutionary game dynamics is to understand the effect of peer pressure on cooperation in a quantitative manner. Peer pressure can be modeled by punishment, which has been shown to be an effective mechanism to sustain cooperation among selfish individuals. We investigate a symmetric punishment strategy, in which an individual will punish each neighbor if their strategies are different, and vice versa. Because of the symmetry in imposing the punishment, one might intuitively expect the strategy to have little effect on cooperation. Utilizing the prisoner’s dilemma game as a prototypical model of interactions at the individual level, we find, through simulation and theoretical analysis, that proper punishment, even when symmetrically imposed on individuals, can enhance cooperation. We also find that the initial density of cooperators plays an important role in the evolution of cooperation driven by mutual punishment.

PACS numbers: 02.50.Le, 87.23.Kg, 87.23.Ge 


I. INTRODUCTION 


Cooperation is ubiquitous in biological, social, and economic systems […]. Understanding and searching for mechanisms that can generate and sustain cooperation among selfish individuals remains an interesting problem. Evolutionary game theory provides a powerful mathematical framework to address this problem […]. Previous theoretical […] and experimental […] studies showed that, for evolutionary game dynamics in spatially extended systems, punishment is an effective approach to enforcing cooperative behavior, where the punishment can be imposed on either cooperators or defectors. The agents that get punished bear a fine, while the punisher pays the cost of imposing the punishment […]. In existing studies, individuals who hold a specific strategy (usually defection) are punished.

In realistic situations, punishment can be mutual, and the strategy typically depends on the surrounding environment, e.g., on neighbors’ strategies. An example is “peer pressure.” Previous psychological experiments demonstrated that an individual tends to conform (fit in) with others in terms of behaviors or opinions […]. Dissent often leads to punishment, either psychological or financial, or both, as human individuals attempt to attain social conformity modulated by peer pressure […]. The main goal of this paper is to understand quantitatively the effect of peer pressure on cooperation by developing and analyzing an evolutionary game model. In particular, we propose a mechanism of punishment in which an individual will punish neighbors who hold the opposite strategy, regardless of whether they are cooperators or defectors.

In contrast to previous models, in which additional punishing strategies were introduced, our model has only two strategies (pure cooperation and pure defection). More importantly, the punishment is mutual in our model, i.e., individual i who punishes individual j is also punished by j, so the cost of punishment can be absorbed into the punishment fine. Because of this symmetry at the individual or “microscopic” level, one may intuitively expect the punishment not to have any effect on cooperation. Surprisingly, we find that symmetric punishment can lead to enhancement of cooperation. We provide computational and heuristic arguments to establish this finding.


II. MODEL 


Without loss of generality, we modify the classic prisoner’s dilemma game (PDG) by incorporating our symmetric punishment mechanism, in order to gain a quantitative understanding of the effect of peer pressure on cooperation. In the original PDG, two players simultaneously decide whether to cooperate or defect. They both receive payoff R upon mutual cooperation and payoff P upon mutual defection. If one cooperates but the other defects, the defector gets payoff T while the cooperator gains payoff S. The payoff ranking for the PDG is T > R > P > S. As a result, in a single round of the PDG, mutual defection is the best strategy for both players, generating the well-known social dilemma. There are different settings of the payoff parameters […]. For computational convenience, the parameters are often rescaled as T = b > 1, R = 1, and P = S = 0, where b denotes the temptation to defect.

In their pioneering work, Nowak and May incorporated spatial structure into the PDG, with individuals playing games only with their immediate neighbors. In the spatial PDG, cooperators can survive by forming clusters in which the benefit of mutual cooperation outweighs the loss against defectors […]. In the past decade, the PDG has been extensively studied for populations on various types of network configurations […], including regular lattices […], small-world networks […], scale-free networks […], dynamic networks, and interdependent networks […].

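Our model is constructed as follows. Player x can take one of two strategies, cooperation or defection, which are described by

$$ s_x = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \quad \text{or} \quad \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \qquad (1) $$
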
respectively. At each time step, each individual plays the PDG with its neighbors. An individual will punish the neighbors that hold a different strategy. The accumulated payoff of player x can thus be expressed as

$$ P_x = \sum_{y \in \Omega_x} \left[ s_x^{T} M s_y - \alpha \left( 1 - s_x^{T} s_y \right) \right], \qquad (2) $$

where the sum runs over the nearest-neighbor set Ω_x of player x, α is the punishment fine, and M is the rescaled payoff matrix given by

$$ M = \begin{pmatrix} 1 & 0 \\ b & 0 \end{pmatrix}. \qquad (3) $$

FIG. 1: (Color online) Asymptotic density of cooperators p_c as a function of the punishment fine α for different values of the initial density of cooperators p_0. The temptation to defect is b = 1.5.


Initially, the cooperation and defection strategies are randomly assigned to all individuals with given probabilities: the initial densities of cooperators and defectors are set to p_0 and 1 − p_0, respectively. The update of strategies is based on the replicator equation […] for well-mixed populations and on the Fermi rule […] for structured populations.
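
As an illustration of how the payoff of Eq. (2) can be evaluated, the following minimal Python sketch computes the accumulated payoff of one player from the strategies of its neighbors. The names `accumulated_payoff`, `payoff_matrix`, `C`, and `D` are hypothetical and introduced only for this example; strategies are assumed to be encoded as the unit vectors (1, 0) for cooperation and (0, 1) for defection.

```python
import numpy as np

# Assumed encoding of the two strategies as unit vectors.
C = np.array([1.0, 0.0])  # cooperation
D = np.array([0.0, 1.0])  # defection


def payoff_matrix(b):
    """Rescaled PDG payoff matrix with R = 1, T = b, and P = S = 0."""
    return np.array([[1.0, 0.0],
                     [b, 0.0]])


def accumulated_payoff(s_x, neighbors, b, alpha):
    """Sum over neighbors of the game payoff minus the mutual fine.

    The fine alpha is charged once for every neighbor whose strategy
    differs from s_x, because s_x @ s_y is 1 for equal strategies and
    0 for different ones.
    """
    M = payoff_matrix(b)
    return sum(s_x @ M @ s_y - alpha * (1.0 - s_x @ s_y) for s_y in neighbors)


if __name__ == "__main__":
    hood = [C, C, D, D]  # two cooperating and two defecting neighbors
    print(accumulated_payoff(C, hood, b=1.5, alpha=0.5))
    print(accumulated_payoff(D, hood, b=1.5, alpha=0.5))
```

For a cooperator with two cooperating and two defecting neighbors, b = 1.5 and α = 0.5, the payoff is 1 + 1 − 0.5 − 0.5 = 1, whereas a defector in the same neighborhood obtains 2 × (1.5 − 0.5) = 2.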


III. RESULTS FOR WELL-MIXED POPULATIONS 

In the case of well-mixed populations, i.e., a population with no structure in which each individual plays with every other, the evolutionary dynamics is determined by the replicator equation for the fraction of cooperators p in the population […]:

$$ \frac{dp}{dt} = p(1-p)(P_c - P_d), \qquad (4) $$

where P_c = p − (1 − p)α is the rescaled payoff of a cooperator and P_d = pb − pα is the rescaled payoff of a defector. The equilibria of p are obtained by setting dp/dt = 0. There exists a mixed equilibrium

$$ p_e = \frac{\alpha}{2\alpha + 1 - b}, \qquad (5) $$

which is unstable. Provided that the initial density of cooperators p_0 is different from 0 and 1, the asymptotic density of cooperators is p_c = 1 if p_0 > p_e and p_c = 0 if p_0 < p_e.
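
For readers who want the intermediate steps, the mixed equilibrium and its position relative to 1/2 can be worked out directly from the two payoffs stated above. This is a sketch added for clarity; it assumes 2α + 1 − b > 0, so that the equilibrium is positive.

```latex
\begin{align*}
P_c - P_d &= \bigl[p - (1-p)\alpha\bigr] - \bigl[pb - p\alpha\bigr]
           = (2\alpha + 1 - b)\,p - \alpha, \\
P_c - P_d = 0 \;&\Longrightarrow\; p_e = \frac{\alpha}{2\alpha + 1 - b}, \\
p_e - \frac{1}{2} &= \frac{b - 1}{2\,(2\alpha + 1 - b)} > 0 \quad \text{for } b > 1 .
\end{align*}
```

Because P_c − P_d increases linearly with p under this assumption, it is negative below p_e and positive above it, so dp/dt drives p away from p_e in both directions, which is why the equilibrium is unstable.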

Figure 1 shows the asymptotic density of cooperators p_c as a function of the punishment fine α for different values of the initial density of cooperators p_0 when the temptation to defect is b = 1.5. From Eq. (5), we note that the mixed equilibrium p_e, when it lies in (0, 1), always exceeds 0.5 because b > 1. As a result, for p_0 < 0.5, p_c is always zero regardless of the values of the temptation to defect and the punishment fine. However, for 0.5 < p_0 < 1, there exists a critical value of the punishment fine (denoted by α_c), below which cooperators die out and above which defectors become extinct. According to Eq. (5), we obtain α_c as

$$ \alpha_c = \frac{(b-1)\,p_0}{2p_0 - 1}. \qquad (6) $$

For example, α_c = 1.5 when p_0 = 0.6 and b = 1.5. From Eq. (6), one can see that α_c increases as the temptation to defect b increases but decreases as the initial density of cooperators p_0 increases, as shown in Fig. 2.

FIG. 2: (a) The critical value of the punishment fine α_c as a function of the temptation to defect b. The initial density of cooperators is p_0 = 0.6. (b) The dependence of α_c on p_0. The temptation to defect is b = 1.5.
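
The critical fine can be checked numerically. The sketch below evaluates the closed-form expression for the critical fine and, as an independent check, integrates the replicator equation with a simple forward-Euler scheme. The function names, the step size, and the integration time are assumptions chosen for illustration, not part of the original analysis.

```python
def critical_fine(b, p0):
    """Critical punishment fine (b - 1) p0 / (2 p0 - 1), valid for 0.5 < p0 < 1."""
    if not 0.5 < p0 < 1.0:
        raise ValueError("a critical fine exists only for 0.5 < p0 < 1")
    return (b - 1.0) * p0 / (2.0 * p0 - 1.0)


def asymptotic_density(p0, alpha, b, dt=0.01, steps=20000):
    """Forward-Euler integration of dp/dt = p (1 - p) (P_c - P_d)."""
    p = p0
    for _ in range(steps):
        payoff_c = p - (1.0 - p) * alpha   # rescaled payoff of a cooperator
        payoff_d = p * b - p * alpha       # rescaled payoff of a defector
        p += dt * p * (1.0 - p) * (payoff_c - payoff_d)
        p = min(max(p, 0.0), 1.0)          # keep the density inside [0, 1]
    return p


if __name__ == "__main__":
    b, p0 = 1.5, 0.6
    print("critical fine:", critical_fine(b, p0))
    print("fine above critical:", asymptotic_density(p0, 2.0, b))
    print("fine below critical:", asymptotic_density(p0, 1.0, b))
```

With b = 1.5 and p_0 = 0.6, a fine above the critical value (here α = 2) drives the density toward 1, while a fine below it (here α = 1) drives it toward 0.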


IV. RESULTS FOR STRUCTURED POPULATIONS 


In a structured population, each individual plays the game only with its immediate neighbors. Without loss of generality, we study the evolution of cooperation on a square lattice, which is a simple and widely used spatial structure. In the following, we use a 100 × 100 square lattice with periodic boundary conditions. We find that the results are qualitatively unchanged for larger system sizes, e.g., a 200 × 200 lattice.

FIG. 3: (Color online) Fraction of cooperators p_c as a function of b, the temptation to defect, for different values of the punishment fine α.

FIG. 4: (Color online) Fraction of cooperators p_c as a function of the punishment fine α for different values of b. The results in (a) and (b) are from simulation and theoretical analysis, respectively.

In the following studies, we set the initial density of cooperators to p_0 = 0.5 unless otherwise stated. Players update their strategies asynchronously in a random sequential order […]. First, a player x is selected at random and obtains its payoff P_x according to Eq. (2). Next, player x chooses one of its nearest neighbors at random, and the chosen neighbor y acquires its payoff P_y by the same rule. Finally, player x adopts the neighbor’s strategy with the probability

$$ W(s_x \leftarrow s_y) = \frac{1}{1 + \exp[-(P_y - P_x)/K]}, \qquad (7) $$

where the parameter K characterizes noise or stochastic factors that permit irrational choices. Following previous studies [52, 53], we set the noise level to K = 0.1. (Different choices of K, e.g., K = 0.01 and K = 1, do not affect the main results.)
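
The adoption probability of the Fermi rule is straightforward to code, with one practical caveat: for small K the exponential can overflow. The following sketch, in which the function name `adoption_probability` is hypothetical, uses an algebraically equivalent two-branch form to avoid that.

```python
import math


def adoption_probability(payoff_x, payoff_y, K=0.1):
    """Probability that player x adopts the strategy of neighbor y.

    Equals 1 / (1 + exp(-(payoff_y - payoff_x) / K)), evaluated so that
    math.exp is only ever called with a non-positive argument.
    """
    z = (payoff_y - payoff_x) / K
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # z < 0, so this cannot overflow
    return ez / (1.0 + ez)


if __name__ == "__main__":
    print(adoption_probability(1.0, 1.0))           # equal payoffs
    print(adoption_probability(0.0, 1.0))           # neighbor does better
    print(adoption_probability(8.0, 0.0, K=0.01))   # neighbor does much worse
```

Equal payoffs give a probability of exactly 0.5; a better-performing neighbor is imitated with probability close to 1, and a worse-performing one with probability close to 0, with K controlling how sharp the transition is.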

The key quantity to characterize the cooperative behavior of the system is the fraction of cooperators p_c in the steady state. All simulations are run for 30 000 time steps to ensure that the system reaches a steady state, and p_c is obtained by averaging over the last 2000 time steps. Each time step consists, on average, of one strategy-updating event per player. Each data point is obtained by averaging the fraction over 200 different realizations.
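
The simulation procedure described above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, the default lattice size, run length, and averaging window are deliberately much smaller than the 100 × 100 lattice, 30 000 steps, and 2000-step window quoted in the text, and only a single realization is run.

```python
import numpy as np

NEIGHBORS = ((1, 0), (-1, 0), (0, 1), (0, -1))  # von Neumann neighborhood


def payoff(s, i, j, b, alpha):
    """Accumulated payoff of site (i, j); s holds 1 (cooperator) or 0 (defector)."""
    L = s.shape[0]
    total = 0.0
    for di, dj in NEIGHBORS:
        nb = s[(i + di) % L, (j + dj) % L]  # periodic boundary conditions
        if nb == s[i, j]:
            total += 1.0 if nb == 1 else 0.0          # R = 1 or P = 0, no fine
        else:
            total += (b if nb == 1 else 0.0) - alpha  # T = b or S = 0, minus fine
    return total


def simulate(L=20, b=1.1, alpha=0.5, K=0.1, p0=0.5, steps=200, window=50, seed=0):
    """Return the cooperator fraction averaged over the last `window` time steps."""
    rng = np.random.default_rng(seed)
    s = (rng.random((L, L)) < p0).astype(np.int8)
    history = []
    for _ in range(steps):
        for _ in range(L * L):  # one time step = one update per player on average
            i, j = rng.integers(L, size=2)
            di, dj = NEIGHBORS[rng.integers(4)]
            k, l = (i + di) % L, (j + dj) % L
            if s[i, j] == s[k, l]:
                continue  # adopting an identical strategy changes nothing
            z = (payoff(s, k, l, b, alpha) - payoff(s, i, j, b, alpha)) / K
            z = np.clip(z, -700.0, 700.0)  # guard against overflow in exp
            if rng.random() < 1.0 / (1.0 + np.exp(-z)):
                s[i, j] = s[k, l]
        history.append(s.mean())
    return float(np.mean(history[-window:]))


if __name__ == "__main__":
    for alpha in (0.0, 0.5, 1.5):
        print(alpha, simulate(alpha=alpha))
```

Averaging the returned value over many seeds would correspond to the average over realizations mentioned in the text.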

Figure 3 shows the fraction of cooperators p_c as a function of b, the temptation to defect, for different values of the punishment fine α. We observe, for any given value of α, a monotonic decrease in p_c as b is increased. In addition, we find that p_c never reaches unity over the whole range of b when the punishment fine is zero. However, for certain values of α, e.g., α = 0.5 and α = 0.8, cooperators can dominate the whole system for b below some critical value.

Figure 4 shows p_c as a function of α for different values of b. We see that, for relatively small values of b (e.g., b = 1.01), p_c increases with α. However, for larger values of b (e.g., b = 1.1 or b = 1.2), there exists an optimal region of α in which full cooperation (p_c = 1) is achieved. For example, the optimal region in α is approximately [0.3, 0.8] and [0.4, 0.6] for b = 1.1 and b = 1.2, respectively. The optimal value of α is moderate, indicating that neither too mild nor too harsh a punishment promotes cooperation. The dependence of p_c on α can be qualitatively predicted analytically through a pair-approximation analysis […], the results of which are shown in Fig. 4(b).

FIG. 5: (Color online) Color-coded map of the fraction of cooperators p_c in the parameter plane (α, b).

To quantify more precisely the ability of the punishment fine α to promote cooperation for various values of b, we compute the behavior of p_c in the parameter plane (α, b), as shown in Fig. 5. We see that, for b < 1.02, p_c increases to unity as α is increased. For 1.02 < b < 1.27, there exists an optimal region of α in which complete extinction of defectors occurs (p_c = 1). The optimal region of α becomes narrower as b is increased. For b > 1.27, there also exists an optimal value of α that results in the highest possible level of cooperation for the corresponding b values, albeit with p_c < 1.

FIG. 6: (Color online) For b = 1.01, time series of the fraction of cooperators, p_c(t), for different values of α. The inset presents the convergence time t_c versus α.

FIG. 7: (Color online) For b = 1.2, time series p_c(t) for different values of α. The inset shows that the fraction of cooperators decays exponentially for α = 0 and α = 1.5.

To gain insights into the mechanism of cooperation enhancement through punishment, we examine the time evolution of p_c for a number of combinations of the parameters α and b. Figure 6 shows the time series p_c(t) for different values of α and a relatively small value of b (e.g., b = 1.01). In every case, p_c(t) decreases initially but then increases to a constant value. A similar phenomenon was also observed in Refs. […]. For small values of α (e.g., α = 0 or α = 0.05), p_c(t) cannot reach unity. For relatively large values of α (e.g., α = 0.15, α = 0.5, or α = 1.5), defectors eventually become extinct and all individuals are cooperators. We define the convergence time t_c as the number of time steps required for complete extinction of defectors. In the inset of Fig. 6, we show t_c as a function of α and observe that t_c is minimized for α ≈ 0.5.

Figure 7 shows the time series p_c(t) for different values of α when there is strong temptation to defect (e.g., b = 1.2). We observe that cooperators gradually die out for either small (e.g., α = 0) or large (e.g., α = 1.5) values of α. A remarkable phenomenon is that, asymptotically, the fraction of cooperators decreases exponentially over time for small or large values of α: p_c(t) ∝ e^{−t/τ}, where the value of τ depends on α, as shown in the inset of Fig. 7. For moderate values of α (e.g., α = 0.5), p_c(t) decreases initially and then increases to unity.
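
One common way to extract a decay time from an exponentially decaying time series is a straight-line fit of the logarithm of the cooperator fraction against time. The sketch below is an assumption about how such an estimate could be made, not the procedure used for the inset of Fig. 7; the function name `decay_time` is hypothetical, and the data in the usage example are synthetic.

```python
import numpy as np


def decay_time(t, p):
    """Estimate tau in p(t) ~ A exp(-t / tau) by a linear fit of ln p versus t.

    Points with p <= 0 are discarded because their logarithm is undefined.
    """
    t = np.asarray(t, dtype=float)
    p = np.asarray(p, dtype=float)
    mask = p > 0.0
    slope, _intercept = np.polyfit(t[mask], np.log(p[mask]), 1)
    return -1.0 / slope


if __name__ == "__main__":
    # Synthetic, noise-free data with a known decay time of 25 time steps.
    t = np.arange(0, 100)
    p = 0.3 * np.exp(-t / 25.0)
    print(decay_time(t, p))
```

In practice the fit should be restricted to the asymptotic part of the time series, since the text states that the exponential form holds only asymptotically.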

How are the cooperators and defectors distributed in physical space when a steady state is reached? Figure 8 shows spatial strategy distributions for different values of the punishment fine α in the equilibrium state. By varying the value of b, we produce the same fraction of cooperators (p_c = 0.8) for each value of α. We see that defectors spread homogeneously over the whole space when α is small (e.g., α = 0.02), while the same number of defectors are more condensed for a higher value of α (e.g., α = 0.4). Such condensation of defectors prevents them from reaching competitive payoffs.

FIG. 8: (Color online) For a number of values of α, snapshots of typical distributions of cooperators (blue) and defectors (red) in the steady state. The fraction of cooperators in the equilibrium state is set to p_c = 0.8 for the different values of α. The values of α and b are (a) α = 0.02, b = 1.001; (b) α = 0.2, b = 1.116; and (c) α = 0.4, b = 1.245.

How does the distribution of cooperators and defectors evolve with time? Figure 9 shows the distribution of cooperators and defectors at different time steps for a large value of b (e.g., b = 1.2) and a moderate value of α (e.g., α = 0.5). Initially, cooperators and defectors are randomly distributed with equal probability [Fig. 9(a)]. After a few time steps, cooperators and defectors are clustered, and the density of cooperators is lower than that in the initial state [Fig. 9(b)]. With time, the cooperator clusters continue to expand and the defector clusters shrink [Fig. 9(c)]. Finally, the whole population consists of cooperators [Fig. 9(d)]. From Fig. 9, one can also observe that the interfaces separating domains of cooperators and defectors become smooth as time evolves. As illustrated in Refs. […], noisy borders are beneficial for defectors, while straight domain walls help cooperators to spread.

In the above studies, we set the initial density of cooperators p_0 to 0.5. Now we study how different values of p_0 affect the evolution of cooperation. From Fig. 10(a), one can see that for a small value of p_0 (e.g., p_0 = 0.2), the cooperation level reaches its maximum at a moderate punishment fine when the temptation to defect b is fixed. However, for a large value of p_0 (e.g., p_0 = 0.8), the cooperation level increases to 1 as the punishment fine increases [Fig. 10(b)].

FIG. 9: (Color online) For α = 0.5 and b = 1.2, snapshots of typical distributions of cooperators (blue) and defectors (red) at different time steps t.

FIG. 10: (Color online) Fraction of cooperators p_c as a function of the punishment fine α for different values of the temptation to defect b. The initial density of cooperators p_0 is (a) 0.2 and (b) 0.8.


V. CONCLUSIONS AND DISCUSSIONS 

To obtain a quantitative understanding of the role of peer pressure in cooperation, we study evolutionary game dynamics and propose a natural mechanism of mutual punishment, in which an individual will punish a neighbor with a fine if their strategies are different, and vice versa. The mutual punishment can be interpreted as a term modifying the strength of coordination-type interaction. Because of the symmetry in imposing the punishment between the individuals, one might expect that it would have little effect on cooperation. However, we find a number of counterintuitive phenomena.

In a well-mixed population, if the initial density of cooperators is no more than 0.5, cooperators die out regardless of the values of the punishment fine and the temptation to defect. If the initial density of cooperators exceeds 0.5, then for each value of the temptation to defect there exists a critical value of the punishment fine, below which full defection results and above which full cooperation results. The critical value of the punishment fine increases as the temptation to defect increases but decreases as the initial density of cooperators increases.

For structured populations, our main findings are as follows. (i) If the initial density of cooperators is small (e.g., 0.2), there exists an optimal value of the punishment fine that leads to the highest level of cooperation. Too weak or too harsh a punishment suppresses cooperation. A similar phenomenon was also observed in Refs. […]. (ii) If the initial density of cooperators is moderate (e.g., 0.5), then for weak temptation to defect, the final fraction of cooperators increases to 1 as the punishment fine increases, whereas for strong temptation to defect, the cooperation level is maximized at a moderate punishment fine. (iii) If the initial density of cooperators is large (e.g., 0.8), then for each value of the temptation to defect, the final fraction of cooperators increases to 1 as the punishment fine increases.

In the present study, we use the prisoner’s dilemma game to understand the role of peer pressure in cooperation. It would be interesting to explore the effect of mutual punishment on other types of evolutionary games (e.g., the snowdrift game and the public goods game) in future work. Under our mechanism, an individual is punished least by adopting the local majority strategy. In fact, following the majority is an important mechanism for the formation of public opinion […]. As a side result, our work provides a connection between evolutionary games and opinion dynamics.
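
The statement that the local majority strategy minimizes the punishment can be made explicit with a short calculation. The symbols n_C and n_D (the numbers of cooperating and defecting neighbors of a given player) and F_C and F_D (the total fine paid by that player as a cooperator or as a defector) are introduced here only for illustration and do not appear in the original text.

```latex
\begin{align*}
F_C &= \alpha\, n_D, \qquad F_D = \alpha\, n_C, \\
F_C - F_D &= \alpha\,(n_D - n_C) < 0 \iff n_C > n_D \quad (\alpha > 0).
\end{align*}
```

That is, a cooperator is fined less than a defector exactly when cooperators form the local majority, and vice versa.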